1.
medRxiv ; 2024 Feb 03.
Article in English | MEDLINE | ID: mdl-38434717

ABSTRACT

Polygenic risk scores (PRS) are summaries of an individual's personalized genetic risk for a trait or disease. However, PRS often perform poorly for phenotype prediction when the ancestry of the target population does not match the population in which GWAS effect sizes were estimated. For many populations this can be addressed by performing GWAS in the target population. However, admixed individuals (whose genomes can be traced to multiple ancestral populations) lie on an ancestry continuum and are not easily represented as a discrete population. Here, we propose slaPRS (stacking local ancestry PRS), which incorporates multiple ancestry GWAS to alleviate the ancestry dependence of PRS in admixed samples. slaPRS uses ensemble learning (stacking) to combine local population specific PRS in regions across the genome. We compare slaPRS to single population PRS and a method that combines single population PRS globally. In simulations, slaPRS outperformed existing approaches and reduced the ancestry dependence of PRS in African Americans. In lipid traits from African British individuals (UK Biobank), slaPRS again improved on single population PRS while performing comparably to the globally combined PRS. slaPRS provides a data-driven and flexible framework to incorporate multiple population-specific GWAS and local ancestry in samples of admixed ancestry.
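The idea of stacking region-level, population-specific scores can be sketched as follows. This is an illustration only, not the slaPRS implementation: the function names (`regional_prs`, `fit_stack`, `predict`), the ridge penalty, and the toy inputs are all assumptions. Each individual gets one feature per (region, GWAS population) pair, and a penalized linear model learns how to weight those features.

```python
def regional_prs(dosages, betas, regions, n_regions):
    """Sum beta * dosage within each genomic region for one individual and one GWAS."""
    scores = [0.0] * n_regions
    for dosage, beta, region in zip(dosages, betas, regions):
        scores[region] += beta * dosage
    return scores


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_stack(features, phenotype, ridge=1e-6):
    """Ridge regression of the phenotype on stacked regional PRS features.

    Returns [intercept, w_1, ..., w_p]; the intercept is not penalized.
    The ridge penalty is an assumption made for this sketch.
    """
    rows = [[1.0] + list(f) for f in features]
    p = len(rows[0])
    xtx = [
        [sum(r[i] * r[j] for r in rows) + (ridge if i == j and i > 0 else 0.0)
         for j in range(p)]
        for i in range(p)
    ]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(p)]
    return solve(xtx, xty)


def predict(weights, feature_row):
    """Apply the learned stacking weights to one individual's regional PRS features."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], feature_row))


# Toy example: three variants in two regions, scored with two hypothetical GWAS.
dosages = [0, 1, 2]
regions = [0, 0, 1]
features_one_person = (
    regional_prs(dosages, [0.5, -0.2, 0.1], regions, 2)    # e.g. population A effects
    + regional_prs(dosages, [0.4, -0.1, 0.3], regions, 2)  # e.g. population B effects
)
```

In practice the stacking weights would be learned in a training sample and evaluated in held-out individuals; the in-sample fit above only shows the mechanics.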

3.
medRxiv ; 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-38260294

ABSTRACT

Venous thromboembolism (VTE) is a significant contributor to morbidity and mortality, with large disparities in incidence rates between Black and White Americans. Polygenic risk scores (PRSs) limited to variants discovered in genome-wide association studies in European-ancestry samples can identify European-ancestry individuals at high risk of VTE. However, there is limited evidence on whether high-dimensional PRSs constructed using more sophisticated methods and more diverse training data can enhance predictive ability and utility across diverse populations. We developed PRSs for VTE using summary statistics from the International Network against Venous Thrombosis (INVENT) consortium GWAS meta-analyses of European- (71,771 cases and 1,059,740 controls) and African-ancestry samples (7,482 cases and 129,975 controls). We used LDpred2 and PRSCSx to construct ancestry-specific and multi-ancestry PRSs and evaluated their performance in an independent European- (6,261 cases and 88,238 controls) and African-ancestry sample (1,385 cases and 12,569 controls). Multi-ancestry PRSs with weights tuned in European- and African-ancestry samples, respectively, outperformed ancestry-specific PRSs in European- (PRSCSx_EUR: AUC=0.61 (0.60, 0.61), PRSCSx_combined_EUR: AUC=0.61 (0.60, 0.62)) and African-ancestry test samples (PRSCSx_AFR: AUC=0.58 (0.57, 0.60), PRSCSx_combined_AFR: AUC=0.59 (0.57, 0.60)). The highest fifth percentile of the best-performing PRS was associated with 1.9-fold and 1.68-fold increased risk for VTE among European- and African-ancestry subjects, respectively, relative to those in the middle stratum. These findings suggest that the multi-ancestry PRS may be used to identify individuals at highest risk for VTE and provide guidance for the most effective treatment strategy across diverse populations.
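The AUC values quoted above measure how often a randomly chosen case has a higher score than a randomly chosen control. A minimal sketch of that definition, with a hypothetical function name and toy data (not the study's evaluation code, which would also adjust for covariates and compute confidence intervals):

```python
def auc(scores, labels):
    """Probability that a random case outranks a random control (ties count as 1/2).

    scores: PRS value per individual; labels: 1 for case, 0 for control.
    Quadratic in sample size, which is fine for a small illustration.
    """
    cases = [s for s, label in zip(scores, labels) if label == 1]
    controls = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data: two controls and two cases.
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

An AUC of 0.5 corresponds to a score that does not separate cases from controls at all, which puts values such as 0.58 to 0.61 in context.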

4.
bioRxiv ; 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37961699

ABSTRACT

Spatial transcriptomics (ST) technologies have advanced to enable transcriptome-wide gene expression analysis at submicron resolution over large areas. Analysis of high-resolution ST data relies heavily on image-based cell segmentation or gridding, which often fails in complex tissues due to diversity and irregularity of cell size and shape. Existing segmentation-free analysis methods scale only to small regions and a small number of genes, limiting their utility in high-throughput studies. Here we present FICTURE, a segmentation-free spatial factorization method that can handle transcriptome-wide data labeled with billions of submicron resolution spatial coordinates. FICTURE is orders of magnitude more efficient than existing methods and it is compatible with both sequencing- and imaging-based ST data. FICTURE reveals the microscopic ST architecture for challenging tissues, such as vascular, fibrotic, muscular, and lipid-laden areas in real data where previous methods failed. FICTURE's cross-platform generality, scalability, and precision make it a powerful tool for exploring high-resolution ST.

5.
Nature ; 622(7984): 784-793, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821707

ABSTRACT

The Mexico City Prospective Study is a prospective cohort of more than 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City. Here we generated genotype and exome-sequencing data for all individuals and whole-genome sequencing data for 9,950 selected individuals. We describe high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Indigenous American, European and African ancestry, with extensive admixture from Indigenous populations in central, southern and southeastern Mexico. Indigenous Mexican segments of the genome had lower levels of coding variation but an excess of homozygous loss-of-function variants compared with segments of African and European origin. We estimated ancestry-specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Indigenous Mexican ancestry at exome variants, all available through a public browser. Using whole-genome sequencing, we developed an imputation reference panel that outperforms existing panels at common variants in individuals with high proportions of central, southern and southeastern Indigenous Mexican ancestry. Our work illustrates the value of genetic studies in diverse populations and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States, where the Hispanic/Latino population is predominantly of Mexican descent.


Subjects
Exome Sequencing , Genome, Human , Genotype , Hispanic or Latino , Adult , Humans , Africa/ethnology , Americas/ethnology , Europe/ethnology , Gene Frequency/genetics , Genetics, Population , Genome, Human/genetics , Genotyping Techniques , Hispanic or Latino/genetics , Homozygote , Loss of Function Mutation/genetics , Mexico , Prospective Studies
6.
Cell Genom ; 3(8): 100345, 2023 Aug 09.
Article in English | MEDLINE | ID: mdl-37601974

ABSTRACT

Stroke is the second leading cause of death and disability worldwide. Stroke prevalence varies by sex and ancestry, possibly due to genetic heterogeneity between subgroups. We performed a genome-wide meta-analysis of 16 biobanks across multiple ancestries to study the genetics of ischemic stroke (60,176 cases, 1,310,725 controls) as part of the Global Biobank Meta-analysis Initiative (GBMI) and further combined the results with previously published MegaStroke. Five novel loci for ischemic stroke (LAMC1, CALCRL, PLSCR1, CDKN1A, and SWAP70) were identified after replication in four additional datasets. One previously reported locus showed significant ancestry heterogeneity (ABO), and one showed significant sex heterogeneity (ALDH2). The ALDH2 association was male specific (males p = 1.67e-24, females p = 0.126) and was additionally observed only in the East Asian ancestry (male) samples. These findings emphasize the need for more diverse datasets with large sample sizes to further understand the genetic predisposition of stroke in different ancestry and sex groups.

8.
Nat Commun ; 14(1): 3202, 2023 06 02.
Article in English | MEDLINE | ID: mdl-37268629

ABSTRACT

We assess performance and limitations of polygenic risk scores (PRSs) for multiple blood pressure (BP) phenotypes in diverse population groups. We compare "clumping-and-thresholding" (PRSice2) and LD-based (LDpred2) methods to construct PRSs from each of multiple GWAS, as well as multi-PRS approaches that sum PRSs with and without weights, including PRS-CSx. We use datasets from the MGB Biobank, TOPMed study, UK Biobank, and All of Us to train, assess, and validate PRSs in groups defined by self-reported race/ethnic background (Asian, Black, Hispanic/Latino, and White). For both systolic BP (SBP) and diastolic BP (DBP), the PRS-CSx-based PRS, constructed as a weighted sum of PRSs developed from multiple independent GWAS, performs best across all race/ethnic backgrounds. Stratified analysis in All of Us shows that PRSs predict BP better in females than in males, in individuals without obesity than in those with obesity, and in middle-aged individuals (40-60 years) than in older or younger individuals.


Subjects
Population Health , Male , Female , Humans , Blood Pressure/genetics , Risk Factors , Multifactorial Inheritance/genetics , Ethnicity/genetics , Genome-Wide Association Study , Genetic Predisposition to Disease
9.
Nat Med ; 29(6): 1540-1549, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37248299

ABSTRACT

Preeclampsia and gestational hypertension are common pregnancy complications associated with adverse maternal and child outcomes. Current tools for prediction, prevention and treatment are limited. Here we tested the association of maternal DNA sequence variants with preeclampsia in 20,064 cases and 703,117 control individuals and with gestational hypertension in 11,027 cases and 412,788 control individuals across discovery and follow-up cohorts using multi-ancestry meta-analysis. Altogether, we identified 18 independent loci associated with preeclampsia/eclampsia and/or gestational hypertension, 12 of which are new (for example, MTHFR-CLCN6, WNT3A, NPR3, PGR and RGL3), including two loci (PLCE1 and FURIN) identified in the multitrait analysis. Identified loci highlight the role of natriuretic peptide signaling, angiogenesis, renal glomerular function, trophoblast development and immune dysregulation. We derived genome-wide polygenic risk scores that predicted preeclampsia/eclampsia and gestational hypertension in external cohorts, independent of clinical risk factors, and reclassified eligibility for low-dose aspirin to prevent preeclampsia. Collectively, these findings provide mechanistic insights into the hypertensive disorders of pregnancy and have the potential to advance pregnancy risk stratification.


Subjects
Eclampsia , Hypertension, Pregnancy-Induced , Hypertension , Pre-Eclampsia , Pregnancy , Female , Child , Humans , Hypertension, Pregnancy-Induced/genetics , Pre-Eclampsia/genetics , Pre-Eclampsia/prevention & control , Aspirin , Risk Factors
10.
bioRxiv ; 2023 Mar 25.
Article in English | MEDLINE | ID: mdl-36993375

ABSTRACT

Understanding the DNA methylation patterns in the human genome is a key step to decipher gene regulatory mechanisms and model mutation rate heterogeneity in the human genome. While methylation rates can be measured, e.g., with bisulfite sequencing, such measures do not capture historical patterns. Here we present a new method, Methylation Hidden Markov Model (MHMM), to estimate the accumulated germline methylation signature in human population history, leveraging two properties: (1) Mutation rates of cytosine to thymine transitions at methylated CG dinucleotides are orders of magnitude higher than those in the rest of the genome. (2) Methylation levels are locally correlated, so the allele frequencies of neighboring CpGs can be used jointly to estimate methylation status. We applied MHMM to allele frequencies from the TOPMed and the gnomAD genetic variation catalogs. Our estimates are consistent with whole genome bisulfite sequencing (WGBS) measured human germ cell methylation levels at 90% of CpG sites, but we also identified ~442,000 historically methylated CpG sites that could not be captured due to sample genetic variation, and inferred methylation status for ~721,000 CpG sites that were missing from WGBS. Hypo-methylated regions identified by combining our results with experimental measures are 1.7 times more likely to recover known active genomic regions than those identified by WGBS alone. Our estimated historical methylation status can be leveraged to enhance bioinformatic analysis of germline methylation, such as annotating regulatory and inactivated genomic regions, and provide insights into sequence evolution, including predicting mutation constraint.

11.
Cell Genom ; 3(2): 100257, 2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36819667

ABSTRACT

Biobanks of linked clinical patient histories and biological samples are an efficient strategy to generate large cohorts for modern genetics research. Biobank recruitment varies by factors such as geographic catchment and sampling strategy, which affect biobank demographics and research utility. Here, we describe the Michigan Genomics Initiative (MGI), a single-health-system biobank currently consisting of >91,000 participants recruited primarily during surgical encounters at Michigan Medicine. The surgical enrollment results in a biobank enriched for many diseases and ideally suited for a disease genetics cohort. Compared with the much larger population-based UK Biobank, MGI has higher prevalence for nearly all diagnosis-code-based phenotypes and larger absolute case counts for many phenotypes. Genome-wide association study (GWAS) results replicate known findings, thereby validating the genetic and clinical data. Our results illustrate that opportunistic biobank sampling within single health systems provides a unique and complementary resource for exploring the genetics of complex diseases.

12.
G3 (Bethesda) ; 13(4), 2023 04 11.
Article in English | MEDLINE | ID: mdl-36759699

ABSTRACT

Population genetics has adapted as technological advances in next-generation sequencing have resulted in an exponential increase of genetic data. A common approach to efficiently analyze genetic variation present in large sequencing data is through the allele frequency spectrum (AFS), defined as the distribution of allele frequencies in a sample. While the frequency spectrum serves to summarize patterns of genetic variation, it implicitly treats mutation types (A→C vs C→T) as interchangeable. However, mutations of different types arise and spread due to spatial and temporal variation in forces such as mutation rate and biased gene conversion that result in heterogeneity in the distribution of allele frequencies across sites. In this work, we explore the impact of this simplification on multiple aspects of population genetic modeling. As a site's mutation rate is strongly affected by flanking nucleotides, we defined a mutation subtype by the base pair change and adjacent nucleotides (e.g. AAA→ATA) and systematically assessed the heterogeneity in the frequency spectrum across 96 distinct 3-mer mutation subtypes using n = 3556 whole-genome sequenced individuals of European ancestry. We observed substantial variation across the subtype-specific frequency spectra, with some of the variation being influenced by molecular factors previously identified for single base mutation types. Estimates of model parameters from demographic inference performed for each mutation subtype's AFS individually varied drastically across the 96 subtypes. In local patterns of variation, a combination of regional subtype composition and local genomic factors shaped the regional frequency spectrum across genomic regions. Our results illustrate how treating variants in large sequencing samples as interchangeable may confound population genetic frameworks and encourage us to consider the unique evolutionary mechanisms of analyzed polymorphisms.


Subjects
Genetics, Population , Mutation Rate , Humans , Gene Frequency , Mutation , Nucleotides
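The count of 96 3-mer mutation subtypes in the abstract above follows from combining strand-equivalent changes: 64 contexts times 3 alternate bases gives 192 labels, which fold to 96 once each change is paired with its reverse complement. The sketch below assumes a folding convention in which the reference base is reported as A or C (consistent with the AAA→ATA example); the function names and the tuple format for variants are hypothetical, and ">" stands in for the arrow in labels.

```python
from collections import Counter, defaultdict
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mutation_subtype(ref_3mer, alt_base):
    """Label a single-nucleotide change by its 3-mer context.

    The label is folded so the central reference base is A or C
    (an assumed convention for this sketch).
    """
    if ref_3mer[1] == alt_base:
        raise ValueError("alternate base must differ from the reference base")
    if ref_3mer[1] in "GT":
        ref_3mer = revcomp(ref_3mer)
        alt_base = COMPLEMENT[alt_base]
    alt_3mer = ref_3mer[0] + alt_base + ref_3mer[2]
    return f"{ref_3mer}>{alt_3mer}"


ALL_SUBTYPES = sorted(
    {
        mutation_subtype("".join(kmer), alt)
        for kmer in product("ACGT", repeat=3)
        for alt in "ACGT"
        if alt != kmer[1]
    }
)
print(len(ALL_SUBTYPES))  # 96


def subtype_spectra(variants):
    """Build one allele-count spectrum per subtype.

    variants: iterable of (ref_3mer, alt_base, allele_count) tuples.
    Returns {subtype: Counter mapping allele_count -> number of sites}.
    """
    spectra = defaultdict(Counter)
    for ref_3mer, alt_base, allele_count in variants:
        spectra[mutation_subtype(ref_3mer, alt_base)][allele_count] += 1
    return spectra
```

Comparing the per-subtype counters, for example the share of singletons in each, is one simple way to see the kind of heterogeneity the abstract describes.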
13.
Am J Hum Genet ; 109(9): 1582-1590, 2022 09 01.
Article in English | MEDLINE | ID: mdl-36055210

ABSTRACT

For the genomics community, allele frequencies within defined groups (or "strata") are useful across multiple research and clinical contexts. Benefits include allowing researchers to identify populations for replication or "look up" studies, enabling researchers to compare population-specific frequencies to validate findings, and facilitating assessment of variant pathogenicity in clinical contexts. However, there are potential concerns with stratified allele frequencies. These include potential re-identification (determining whether or not an individual participated in a given research study based on allele frequencies and individual-level genetic data), harm from associating stigmatizing variants with specific groups, potential reification of race as a biological rather than a socio-political category, and whether presenting stratified frequencies (and the downstream applications that this presentation enables) is consistent with participants' informed consents. The NHLBI Trans-Omics for Precision Medicine (TOPMed) program considered the scientific and social implications of different approaches for adding stratified frequencies to the TOPMed BRAVO (Browse All Variants Online) variant server. We recommend a novel approach of presenting ancestry-specific allele frequencies using a statistical method based upon local genetic ancestry inference. Notably, this approach does not require grouping individuals by either predominant global ancestry or race/ethnicity and, therefore, mitigates re-identification and other concerns as the mixture distribution of ancestral allele frequencies varies across the genome. Here we describe our considerations and approach, which can assist other genomics research programs facing similar issues of how to define and present stratified frequencies in publicly available variant databases.


Subjects
Motivation , Precision Medicine , Ethnicity/genetics , Gene Frequency/genetics , Genomics/methods , Humans
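To make the notion of ancestry-specific allele frequencies concrete, the sketch below uses a simple counting estimator: at one variant, haplotypes are grouped by their inferred local ancestry at that site, not by the individual's global ancestry or reported race/ethnicity. This is an assumption-laden illustration and not the TOPMed statistical method; the function name, the ancestry labels, and the use of phased hard-called alleles are all hypothetical.

```python
from collections import defaultdict


def ancestry_specific_af(alleles, local_ancestry):
    """Estimate the alternate-allele frequency per local ancestry at one variant.

    alleles: 0/1 alternate-allele indicator for each haplotype at the variant.
    local_ancestry: inferred ancestry label of each haplotype at the same site.
    Returns {ancestry_label: alternate-allele frequency}.
    """
    alt_count = defaultdict(int)
    total = defaultdict(int)
    for allele, ancestry in zip(alleles, local_ancestry):
        total[ancestry] += 1
        alt_count[ancestry] += allele
    return {ancestry: alt_count[ancestry] / total[ancestry] for ancestry in total}


# Toy data: six haplotypes whose local ancestry differs at this site.
alleles = [1, 0, 1, 1, 0, 0]
ancestry = ["AFR", "AFR", "EUR", "EUR", "EUR", "AMR"]
print(ancestry_specific_af(alleles, ancestry))
```

Because the same individual can contribute haplotypes to different ancestry groups at different sites, the set of haplotypes behind each frequency changes along the genome.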
15.
Neurogastroenterol Motil ; 34(6): e14236, 2022 06.
Article in English | MEDLINE | ID: mdl-34378841

ABSTRACT

BACKGROUND: Functional dyspepsia (FD) is a common gastrointestinal condition of poorly understood pathophysiology. While symptoms' overlap with other conditions may indicate common pathogenetic mechanisms, genetic predisposition is suspected but has not been adequately investigated. METHODS: Using healthcare, questionnaire, and genetic data from three large population-based biobanks (UK Biobank, EGCUT, and MGI), we surveyed FD comorbidities, heritability, and genetic correlations across a wide spectrum of conditions and traits in 10,078 cases and 351,282 non-FD controls of European ancestry. KEY RESULTS: In UK Biobank, 281 diagnoses were detected at increased prevalence in FD, based on healthcare records. Among these, gastrointestinal conditions (OR = 4.0, p < 1.0 × 10⁻³⁰⁰), anxiety disorders (OR = 2.3, p < 1.4 × 10⁻²⁷), ischemic heart disease (OR = 2.2, p < 2.3 × 10⁻⁷⁶), and infectious and parasitic diseases (OR = 2.1, p = 1.5 × 10⁻⁷³) showed strongest association with FD. Similar results were obtained in an analysis of self-reported conditions and use of medications from questionnaire data. Based on a genome-wide association meta-analysis of genotypes across all cohorts, FD heritability was estimated close to 5% (h²_SNP = 0.047, p = 0.014). Genetic correlations indicate FD predisposition is shared with several other diseases and traits (r_g > 0.344), mostly overlapping with those also enriched in FD patients. Suggestive (p < 5.0 × 10⁻⁶) association with FD risk was detected for 13 loci, with 2 showing nominal replication (p < 0.05) in an independent cohort of 192 FD patients. CONCLUSIONS & INFERENCES: FD has a weak heritable component that shows commonalities with multiple conditions across a wide spectrum of pathophysiological domains. This new knowledge contributes to a better understanding of FD etiology and may have implications for improving its treatment.


Subjects
Dyspepsia , Gastrointestinal Diseases , Crosses, Genetic , Dyspepsia/diagnosis , Dyspepsia/epidemiology , Dyspepsia/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Surveys and Questionnaires
16.
Hum Mol Genet ; 31(19): 3367-3376, 2022 09 29.
Article in English | MEDLINE | ID: mdl-34718574

ABSTRACT

In the era of personalized medicine with more and more patient-specific targeted therapies being used, we need reliable, dynamic, faster and sensitive biomarkers both to track the causes of disease and to develop and evolve therapies during the course of treatment. Metabolomics recently has shown substantial evidence to support its emerging role in disease diagnosis and prognosis. Aside from biomarkers and development of therapies, it is also an important goal to understand the involvement of mitochondrial DNA (mtDNA) in metabolic regulation, aging and disease development. Somatic mutations of the mitochondrial genome are also heavily implicated in age-related disease and aging. The general hypothesis is that an alteration in the concentration of metabolite profiles (possibly conveyed by lifestyle and environmental factors) influences the increase of mutation rate in the mtDNA and thereby contributes to a range of pathophysiological alterations observed in complex diseases. We performed an inverted mitochondrial genome-wide association analysis between mitochondrial nucleotide variants (mtSNVs) and concentration of metabolites. We used 151 metabolites and the whole sequenced mitochondrial genome from 2718 individuals to identify the genetic variants associated with metabolite profiles. Because of the high coverage, next-generation sequencing-based analysis of the mitochondrial genome allows for an accurate detection of mitochondrial heteroplasmy and for the identification of variants associated with the metabolome. The strongest association was found for mt715G > A located in the MT-12SrRNA with the metabolite ratio of C2/C10:1 (P-value = 6.82 × 10⁻⁹, β = 0.909). The second most significant mtSNV was found for mt3714A > G located in the MT-ND1 with the metabolite ratio of phosphatidylcholine (PC) ae C42:5/PC ae C44:5 (P-value = 1.02 × 10⁻⁸, β = 3.631). A large number of significant metabolite ratios were observed involving PC aa C36:6 and the variant mt10689G > A, located in the MT-ND4L gene. These results show an important interconnection between mitochondria and metabolite concentrations. Considering that some of the significant metabolites found in this study have been previously related to complex diseases, such as neurological disorders and metabolic conditions, these associations found here might play a crucial role for further investigations of such complex diseases. Understanding the mechanisms that control human health and disease, in particular, the role of genetic predispositions and their interaction with environmental factors is a prerequisite for the development of safe and efficient therapies for complex disorders.


Subjects
Genome-Wide Association Study , Metabolomics , Biomarkers/metabolism , DNA, Mitochondrial/genetics , DNA, Mitochondrial/metabolism , Humans , Metabolomics/methods , Mitochondria/genetics , Mitochondria/metabolism , Nucleotides/metabolism , Phosphatidylcholines/metabolism
17.
Nat Commun ; 12(1): 6031, 2021 10 15.
Article in English | MEDLINE | ID: mdl-34654805

ABSTRACT

Fibromuscular dysplasia (FMD) is an arteriopathy associated with hypertension, stroke and myocardial infarction, affecting mostly women. We report results from the first genome-wide association meta-analysis of six studies including 1556 FMD cases and 7100 controls. We find an estimate of SNP-based heritability compatible with FMD having a polygenic basis, and report four robustly associated loci (PHACTR1, LRP1, ATP2B1, and LIMA1). Transcriptome-wide association analysis in arteries identifies one additional locus (SLC24A3). We characterize open chromatin in arterial primary cells and find that FMD associated variants are located in arterial-specific regulatory elements. Target genes are broadly involved in mechanisms related to actin cytoskeleton and intracellular calcium homeostasis, central to vascular contraction. We find significant genetic overlap between FMD and more common cardiovascular diseases and traits including blood pressure, migraine, intracranial aneurysm, and coronary artery disease.


Subjects
Cardiovascular Diseases/complications , Cardiovascular Diseases/genetics , Fibromuscular Dysplasia/complications , Fibromuscular Dysplasia/genetics , Genome-Wide Association Study , Adult , Arteries , Cytoskeletal Proteins/genetics , Female , Fibroblasts , Gene Expression Regulation , Humans , Intracranial Aneurysm , Low Density Lipoprotein Receptor-Related Protein-1/genetics , Male , Microfilament Proteins/genetics , Middle Aged , Plasma Membrane Calcium-Transporting ATPases/genetics , Sodium-Calcium Exchanger/genetics , Transcriptome
19.
Genetics ; 217(4), 2021 04 15.
Article in English | MEDLINE | ID: mdl-33686438

ABSTRACT

Genotype imputation is an indispensable step in human genetic studies. Large reference panels with deeply sequenced genomes now allow interrogating variants with minor allele frequency < 1% without sequencing. Although it is critical to consider limits of this approach, imputation methods for rare variants have only done so empirically; the theoretical basis of their imputation accuracy has not been explored. To provide theoretical consideration of imputation accuracy under the current imputation framework, we develop a coalescent model of imputing rare variants, leveraging the joint genealogy of the sample to be imputed and reference individuals. We show that broadly used imputation algorithms include model misspecifications about this joint genealogy that limit the ability to correctly impute rare variants. We develop closed-form solutions for the probability distribution of this joint genealogy and quantify the inevitable error rate resulting from the model misspecification across a range of allele frequencies and reference sample sizes. We show that the probability of a falsely imputed minor allele decreases with reference sample size, but the proportion of falsely imputed minor alleles mostly depends on the allele count in the reference sample. We summarize the impact of this imputation error on association tests by calculating the r² between imputed and true genotype and show that even when modeling other sources of error, the model misspecification has a significant impact on the r² of rare variants. To evaluate these predictions in practice, we compare the imputation of the same dataset across imputation panels of different sizes. Although this empirical imputation accuracy is substantially lower than our theoretical prediction, modeling misspecification seems to further decrease imputation accuracy for variants with low allele counts in the reference. These results provide a framework for developing new imputation algorithms and for interpreting rare variant association analyses.


Subjects
Gene Frequency , Genome, Human , Models, Genetic , Polymorphism, Genetic , Algorithms , Genetics, Population/methods , Humans
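The r² between imputed and true genotypes mentioned in the abstract above is commonly computed as the squared Pearson correlation between imputed dosages and true genotype counts at a variant. A minimal sketch of that calculation follows; the function name and toy data are hypothetical, and this is not the paper's coalescent derivation.

```python
def dosage_r2(imputed, truth):
    """Squared Pearson correlation between imputed dosages and true genotypes.

    imputed: expected alternate-allele count per individual (0.0 to 2.0).
    truth: true alternate-allele count per individual (0, 1 or 2).
    Returns NaN when either vector has no variance, which happens for a
    variant that is monomorphic in the evaluated sample.
    """
    n = len(truth)
    mean_x = sum(imputed) / n
    mean_y = sum(truth) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    sxx = sum((x - mean_x) ** 2 for x in imputed)
    syy = sum((y - mean_y) ** 2 for y in truth)
    if sxx == 0 or syy == 0:
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


# Toy data: four individuals at one variant.
print(dosage_r2([0.1, 0.9, 1.8, 0.2], [0, 1, 2, 0]))
```

For rare variants only a handful of individuals carry the minor allele, so a single falsely imputed carrier can lower this statistic substantially.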
20.
Am J Hum Genet ; 108(4): 669-681, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33730541

ABSTRACT

Tests of association between a phenotype and a set of genes in a biological pathway can provide insights into the genetic architecture of complex phenotypes beyond those obtained from single-variant or single-gene association analysis. However, most existing gene set tests have limited power to detect gene set-phenotype association when a small fraction of the genes are associated with the phenotype and cannot identify the potentially "active" genes that might drive a gene set-based association. To address these issues, we have developed Gene set analysis Association Using Sparse Signals (GAUSS), a method for gene set association analysis that requires only GWAS summary statistics. For each significantly associated gene set, GAUSS identifies the subset of genes that have the maximal evidence of association and can best account for the gene set association. Using pre-computed correlation structure among test statistics from a reference panel, our p value calculation is substantially faster than other permutation- or simulation-based approaches. In simulations with varying proportions of causal genes, we find that GAUSS effectively controls type 1 error rate and has greater power than several existing methods, particularly when a small proportion of genes account for the gene set signal. Using GAUSS, we analyzed UK Biobank GWAS summary statistics for 10,679 gene sets and 1,403 binary phenotypes. We found that GAUSS is scalable and identified 13,466 phenotype and gene set association pairs. Within these gene sets, we identify an average of 17.2 (max = 405) genes that underlie these gene set associations.


Subjects
Biological Specimen Banks , Data Interpretation, Statistical , Databases, Genetic , Datasets as Topic , Genome-Wide Association Study/methods , Phenotype , ATP-Binding Cassette Transporters/genetics , Computer Simulation , Gene Expression/genetics , Humans , Research Design , Time Factors , United Kingdom , Web Browser